An MDL Approach to Efficiently Discover Communities in Bipartite Network

Authors

  • Kaikuo Xu
  • Changjie Tang
  • Chuan Li
  • Yexi Jiang
  • Rong Tang
Abstract

Bipartite networks are a class of complex networks. They are widely used in applications such as social network analysis, collaborative filtering, and information retrieval. Partitioning a bipartite network into smaller modules helps reveal its structure. The main contributions of this paper are: (1) a minimum description length (MDL) criterion for identifying a good partition of a bipartite network; (2) a greedy algorithm based on combination theory, named MDL-greedy, that approaches the optimal partition of a bipartite network. The greedy algorithm automatically searches for the number of partitions and requires no user intervention; (3) experiments on synthetic datasets and the southern women dataset. The results show that our method generates higher-quality results than the state-of-the-art methods Cross-Association and Information-theoretic co-clustering. Experimental results also show that the proposed algorithm scales well. The improvement reaches up to about 14% in precision, 40% in ratio, and 70% in running time.
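To make the idea of an MDL criterion for a bipartite partition concrete, the following is a minimal sketch of a generic two-part code length for a 0/1 adjacency matrix whose rows and columns have been grouped. It is not the criterion defined in the paper, which the abstract does not spell out. The function names, the specific model-cost terms, and the toy matrix are all assumptions made for illustration. A lower total means the partition compresses the network better.

```python
import math


def block_bits(ones, size):
    """Bits to encode a block of `size` cells holding `ones` ones (entropy coding)."""
    if size == 0 or ones == 0 or ones == size:
        return 0.0
    p = ones / size
    return -size * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def description_length(adj, row_labels, col_labels):
    """Illustrative two-part code length (in bits) of a 0/1 matrix.

    Assumption: labels are integers 0..k-1 (rows) and 0..l-1 (columns).
    This is a hypothetical cost, not the paper's MDL criterion.
    """
    n_rows, n_cols = len(adj), len(adj[0])
    k = max(row_labels) + 1
    l = max(col_labels) + 1

    row_sizes = [0] * k
    col_sizes = [0] * l
    for r in row_labels:
        row_sizes[r] += 1
    for c in col_labels:
        col_sizes[c] += 1

    # Count the ones that fall into each (row group, column group) block.
    ones = [[0] * l for _ in range(k)]
    for i in range(n_rows):
        for j in range(n_cols):
            ones[row_labels[i]][col_labels[j]] += adj[i][j]

    # Model cost: group membership of every row and column,
    # plus the number of ones in every block.
    model = n_rows * math.log2(k) + n_cols * math.log2(l)
    data = 0.0
    for a in range(k):
        for b in range(l):
            size = row_sizes[a] * col_sizes[b]
            model += math.log2(size + 1)
            data += block_bits(ones[a][b], size)  # cost of the cells themselves
    return model + data


# Toy bipartite network with two obvious communities (hypothetical data).
adj = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
one_group = description_length(adj, [0, 0, 0, 0], [0, 0, 0, 0])
two_groups = description_length(adj, [0, 0, 1, 1], [0, 0, 1, 1])
print(two_groups < one_group)
```

On this toy matrix the two-group partition yields pure blocks, so its data cost is zero and its total is smaller than that of the single-group partition. A greedy search in this spirit would repeatedly try splits or moves and keep those that lower the total, stopping when none does, which is one way the number of partitions can be found without user input.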


Similar resources


Designing an Ontology for Knowledge Discovery in Iran’s Vaccine

Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...


Hierarchical, Parameter-Free Community Discovery

Given a large bipartite graph (like a document-term or user-product graph), how can we find meaningful communities quickly and automatically? We propose to look for community hierarchies, with communities-within-communities. Our proposed method, the Context-specific Cluster Tree (CCT), finds such communities at multiple levels, with no user intervention, based on information theoretic principles...


Solving a nurse rostering problem considering nurses preferences by graph theory approach

The Nurse Rostering Problem (NRP), or Nurse Scheduling Problem (NSP), is a complex scheduling problem that affects hospital personnel on a daily basis all over the world and is known to be NP-hard. The problem is to decide which members of a team of nurses should be on duty at any time during a rostering period of, typically, one month. It is very important to efficiently utilize time and effort, ...


Numeric Law Discovery using Neural Networks

This paper proposes a new connectionist approach to numeric law discovery; i.e., neural networks (law-candidates) are trained by using a newly invented second-order learning algorithm based on a quasi-Newton method, called BPQ, and the MDL criterion selects the most suitable from law-candidates. The main advantage of our method over previous work of symbolic or connectionist approach is that it...



Publication date: 2009